
Use the doc-values skip index to skip per-doc value lookups in LongRangeFacetCutter#16268

Draft
slow-J wants to merge 5 commits into apache:main from slow-J:lucene-16249-skipper-range-facets


@slow-J slow-J commented Jun 17, 2026

Contributor

Resolves #16249

Implementation heavily inspired by HistogramCollector.java.

Range faceting (LongRangeFacetCutter in the sandbox module) currently reads the doc-values value for every matching document and binary-searches it into an elementary interval. When the faceted field is single-valued and has a doc-values skip index, we can use that index instead: for a dense skip block whose min and max values fall into the same elementary interval, every document in that block maps to that interval, so we can skip the per-doc value lookup and binary search.

Limitation: applies to single-valued long fields only.
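
As a rough illustration of the dense-block check described above, here is a stand-alone sketch. It is not the code in this PR and uses no Lucene types: Block, intervalOf, and sharedInterval are hypothetical names, and representing elementary intervals by sorted, distinct, inclusive upper bounds is an assumption made for the example.

```java
import java.util.Arrays;

/** Stand-alone model of the dense-block check; all names here are hypothetical. */
final class SkipBlockSketch {

  /** Summary of one skip block: doc ID range, number of docs with a value, value range. */
  record Block(int minDocId, int maxDocId, int docCount, long minValue, long maxValue) {
    /** Dense means every doc ID in [minDocId, maxDocId] has a value. */
    boolean isDense() {
      return docCount == maxDocId - minDocId + 1;
    }
  }

  /** Index of the elementary interval containing value (assumed: inclusive upper bounds). */
  static int intervalOf(long[] upperBounds, long value) {
    int pos = Arrays.binarySearch(upperBounds, value);
    // A miss returns -(insertionPoint) - 1; the insertion point is the first bound >= value.
    return pos >= 0 ? pos : -pos - 1;
  }

  /** Shared interval for a dense block, or -1 when per-doc lookups are still needed. */
  static int sharedInterval(long[] upperBounds, Block block) {
    if (!block.isDense()) {
      return -1;
    }
    int lo = intervalOf(upperBounds, block.minValue());
    int hi = intervalOf(upperBounds, block.maxValue());
    return lo == hi ? lo : -1;
  }

  public static void main(String[] args) {
    long[] upperBounds = {9, 19, 29, Long.MAX_VALUE};
    Block sameInterval = new Block(0, 4095, 4096, 12, 18);
    Block straddles = new Block(4096, 8191, 4096, 15, 25);
    Block sparse = new Block(8192, 12287, 100, 12, 18);
    System.out.println(sharedInterval(upperBounds, sameInterval)); // 1
    System.out.println(sharedInterval(upperBounds, straddles)); // -1
    System.out.println(sharedInterval(upperBounds, sparse)); // -1
  }
}
```

When sharedInterval returns a valid index, all docs of the block can be attributed to that interval without reading their values; otherwise the per-doc lookup and binary search stay in place.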

Benchmark (luceneutil)

I used my branch https://github.com/slow-J/luceneutil/tree/github-16249-range-facet-bench, which cherry-picks two of @epotyom's commits (mainly mikemccand/luceneutil#582, which adds range-facet support).

Setup:
localrun.py, wikimediumall (33.3M docs), index-sorted by lastMod_skipper with addDVSkippers=true. Baseline = main, candidate = this change, both using DURING_COLLECTION, so the only difference is this optimization. 30 JVM iterations.

Command: python3 -u src/python/localrun.py -s rangeFacetsWikimediumAll -b lucene_baseline -c lucene_candidate -iterations 30 -warmups 20 2>&1 | tee "$BASE/run-timing7.txt"

Edit: new benchmark results after the changes for Egor's first two comments.
Edit 2: new benchmark results after the unwrapping was removed.

QPS

| Task | QPS baseline | StdDev | QPS modified | StdDev | Pct diff | p-value |
|---|---|---|---|---|---|---|
| BrowseLastModOvlpRangeFacets | 1.26 | (7.7%) | 2.72 | (10.6%) | 115.5% (90% - 145%) | 0.000 |
| BrowseLastModRangeFacets | 2.21 | (6.0%) | 3.31 | (8.8%) | 50.0% (33% - 68%) | 0.000 |
| MedTermLastModOvlpRangeFacets | 3.82 | (13.5%) | 5.48 | (5.7%) | 43.5% (21% - 72%) | 0.000 |
| MedTermLastModRangeFacets | 4.15 | (13.6%) | 5.26 | (7.9%) | 26.5% (4% - 55%) | 0.000 |
| BrowseIDOvlpRangeFacets | 1.21 | (6.6%) | 1.10 | (6.7%) | -9.6% (-21% - 4%) | 0.000 |
| BrowseIDRangeFacets | 2.33 | (8.6%) | 2.57 | (5.1%) | 10.1% (-3% - 26%) | 0.000 |
| MedTermIDOvlpRangeFacets | 3.79 | (13.5%) | 4.61 | (11.1%) | 21.6% (-2% - 53%) | 0.000 |
| MedTermIDRangeFacets | 5.98 | (4.6%) | 5.92 | (2.7%) | -0.9% (-7% - 6%) | 0.340 |

Latency (ms) — aggregated across all iterations

| Task | P50 B | P50 C | P50 Diff | P90 B | P90 C | P90 Diff | P99 B | P99 C | P99 Diff | P999 B | P999 C | P999 Diff | P100 B | P100 C | P100 Diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BrowseLastModOvlpRangeFacets | 844.184 | 386.006 | -54.3% | 1437.289 | 581.094 | -59.6% | 7523.983 | 828.460 | -89.0% | 9510.480 | 868.764 | -90.9% | 9555.393 | 888.500 | -90.7% |
| BrowseLastModRangeFacets | 474.762 | 319.836 | -32.6% | 854.574 | 546.789 | -36.0% | 4412.829 | 781.421 | -82.3% | 7775.105 | 862.760 | -88.9% | 7910.258 | 893.449 | -88.7% |
| MedTermLastModOvlpRangeFacets | 286.226 | 187.654 | -34.4% | 552.668 | 436.448 | -21.0% | 771.820 | 599.881 | -22.3% | 1327.279 | 705.213 | -46.9% | 1445.804 | 707.766 | -51.0% |
| MedTermLastModRangeFacets | 260.932 | 200.115 | -23.3% | 652.004 | 510.872 | -21.6% | 847.848 | 635.331 | -25.1% | 2966.134 | 743.950 | -74.9% | 3060.317 | 745.647 | -75.6% |
| BrowseIDOvlpRangeFacets | 860.895 | 976.209 | +13.4% | 1419.693 | 1279.444 | -9.9% | 8271.185 | 1476.704 | -82.1% | 9919.502 | 1531.237 | -84.6% | 9928.280 | 1536.195 | -84.5% |
| BrowseIDRangeFacets | 461.967 | 404.593 | -12.4% | 799.144 | 625.845 | -21.7% | 5972.427 | 860.420 | -85.6% | 8963.973 | 930.259 | -89.6% | 9483.903 | 942.619 | -90.1% |
| MedTermIDOvlpRangeFacets | 294.831 | 235.198 | -20.2% | 676.861 | 539.088 | -20.4% | 897.009 | 671.736 | -25.1% | 1835.175 | 742.857 | -59.5% | 2055.182 | 744.089 | -63.8% |
| MedTermIDRangeFacets | 169.089 | 170.565 | +0.9% | 495.786 | 401.676 | -19.0% | 697.206 | 591.299 | -15.2% | 1026.169 | 690.797 | -32.7% | 1647.263 | 695.272 | -57.8% |
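
For reading the Diff columns: the values above appear consistent with the relative change of the candidate (C) against the baseline (B), so a negative number means lower latency with this change. The snippet below is only a sketch of that arithmetic under that assumption. pctDiff is a hypothetical helper, not part of luceneutil.

```java
import java.util.Locale;

/** Hypothetical helper reproducing the Diff arithmetic from the latency rows. */
final class LatencyDiffSketch {

  /** Relative change of candidate vs. baseline in percent; negative means faster. */
  static double pctDiff(double baselineMs, double candidateMs) {
    return (candidateMs - baselineMs) / baselineMs * 100.0;
  }

  public static void main(String[] args) {
    // P50 B and P50 C for BrowseLastModOvlpRangeFacets.
    System.out.println(String.format(Locale.ROOT, "%+.1f%%", pctDiff(844.184, 386.006))); // -54.3%
    // P50 B and P50 C for BrowseIDOvlpRangeFacets, the one P50 regression.
    System.out.println(String.format(Locale.ROOT, "%+.1f%%", pctDiff(860.895, 976.209))); // +13.4%
  }
}
```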

@slow-J slow-J force-pushed the lucene-16249-skipper-range-facets branch from 03d7d2a to 066c419 Compare June 17, 2026 16:03
@github-actions github-actions Bot added this to the 10.5.0 milestone Jun 17, 2026

slow-J commented Jun 18, 2026

Contributor Author

I reran the benchmarks, this time correctly using localrun, and updated the results in #16268 (comment).

@slow-J slow-J force-pushed the lucene-16249-skipper-range-facets branch from 2e7144b to 0c72d5f Compare June 19, 2026 14:45

@epotyom epotyom left a comment

Contributor

Nice change! One suggestion below

@slow-J slow-J force-pushed the lucene-16249-skipper-range-facets branch from 1065433 to 7db2833 Compare June 23, 2026 10:39
@github-actions github-actions Bot modified the milestones: 10.5.0, 10.6.0 Jun 23, 2026
@slow-J slow-J requested a review from epotyom June 23, 2026 11:29
@slow-J slow-J force-pushed the lucene-16249-skipper-range-facets branch from 7db2833 to 88fe293 Compare June 29, 2026 11:40
}

/** Single-valued {@link LongValues} for {@link #skipField} in this segment. */
final LongValues skipFieldValues(LeafReaderContext context) throws IOException {

Contributor

Is this method a part of the single value unwrapping logic that we want to remove for now?

Contributor Author

Nope, this is part of the core skipper path.

@epotyom epotyom Jul 2, 2026

Contributor

Oh I see now. The initial logic was:

if single-valued AND has DocValuesSkipper:
  use the optimized version

Then in the second revision we tried:

if single-valued:
  if has skipper:
    use the skipper-optimized version
  else:
    use the single-valued optimized version

And now we are back to the first approach?

The reason I got confused is that I thought we also used the skipper optimization for multi-valued fields, but I see now that the description explicitly calls out single-valued fields only. I wonder why that is the case? I thought Lucene supported skippers for multi-valued fields as well.

Contributor

Just to clarify why I'm asking about multi-valued fields: the current version still uses the unwrap-singleton optimization, but only together with the skipper. So there are basically two optimizations here, and I thought that was what you wanted to avoid?

If that’s the case, and if multi-valued fields support skippers, I’d suggest scoping this PR to the skipper optimization only, for both single- and multi-valued fields, and moving the single-valued unwrapping optimization to a follow-up PR. WDYT?

Contributor Author

I think I'm getting confused by the naming here. I have removed the change that was not related to the skipper.

The unwrapSingleton call is used to detect that the segment is single-valued and to get the single-valued NumericDocValues. The other change, now removed, caused single-valued segments with no skipper to always go to the single-valued cutter, when that should depend on the existing logic.

Thanks for this and the other comments. I'll look into adding multi-valued skipper support when I get the time, and do it in this PR.
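
To pin down the naming, the dispatch variants discussed in this thread can be written as a small decision function. This is a sketch with hypothetical names (Path, skipperOnly, withSingleValuedFallback), not the PR's code. EXISTING stands for whatever the pre-existing cutter selection does, which the thread does not spell out.

```java
/** Hypothetical model of the per-segment dispatch variants discussed in this thread. */
final class DispatchSketch {

  enum Path { SKIPPER, SINGLE_VALUED, EXISTING }

  /** First and current approach: the skipper path requires both conditions. */
  static Path skipperOnly(boolean singleValued, boolean hasSkipper) {
    return singleValued && hasSkipper ? Path.SKIPPER : Path.EXISTING;
  }

  /** Second revision (since removed): single-valued segments never reach the existing logic. */
  static Path withSingleValuedFallback(boolean singleValued, boolean hasSkipper) {
    if (!singleValued) {
      return Path.EXISTING; // assumed: multi-valued segments were unaffected
    }
    return hasSkipper ? Path.SKIPPER : Path.SINGLE_VALUED;
  }

  public static void main(String[] args) {
    // The two variants differ only for single-valued segments without a skipper.
    System.out.println(skipperOnly(true, false)); // EXISTING
    System.out.println(withSingleValuedFallback(true, false)); // SINGLE_VALUED
  }
}
```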

…range-facets

# Conflicts:
#	lucene/CHANGES.txt
@slow-J slow-J marked this pull request as ready for review June 30, 2026 09:59
final LongValuesSource singleValues;

// Field name whose skip index is used on the single-valued path, or null when faceting a source.
final String skipField;

Contributor

WDYT about renaming this to fieldName? skip is a bit confusing, since the field does not necessarily have a skipper even if this value is set.

}

/** Single-valued {@link LongValues} for {@link #skipField} in this segment. */
final LongValues skipFieldValues(LeafReaderContext context) throws IOException {

Contributor

If we rename skipField to fieldName, maybe we should rename this method as well, perhaps to singletonFieldValues?

  /** Single-valued {@link LongValues} for {@link #skipField} in this segment. */
  final LongValues skipFieldValues(LeafReaderContext context) throws IOException {
    NumericDocValues values =
        DocValues.unwrapSingleton(DocValues.getSortedNumeric(context.reader(), skipField));

Contributor

Can we deduplicate the calls to DocValues.getSortedNumeric and DocValues.unwrapSingleton? We call them both in maybeSkipper and here for the same field.

@slow-J slow-J marked this pull request as draft July 2, 2026 16:59

Development

Successfully merging this pull request may close these issues.

Can we use DocValuesSkipper for range facets?
